Chris: For those who have not heard of it before, can you tell us more about the School of Informatics at Sk vde University?
Simon: Certainly, but I may be a little long-winded. Sk vde is a city in the area between the two large lakes in southern Sweden. The city is a busy place. Sk vde is home to the regional hospital, some of Volvo s manufacturing facilities, two regiments of the Swedish defence force, a lot of businesses in the Swedish computer games industry, other tech companies and more.
The University of Sk vde is relatively small. Sweden s large land area and low population density mean that regional centres such as Sk vde are important and local universities support businesses by training new staff and supporting innovation.
The School of Informatics has two divisions. One focuses on teaching and researching computer games. The other division encompasses a wider range of teaching and research, including computer science, web development, computer security, network administration, data science and so on.
Chris: You recently had a open-access paper published in Software Quality Journal. Could you tell us a little bit more about it and perhaps briefly summarise its key findings?
Simon: The paper is one output of a collaborative research project with six Swedish businesses that use open source software. There are two parts to the paper. The first consists of an analysis of what the group of businesses in the project know about Reproducible Builds (R-Bs), their experiences with R-Bs and their perception of the value of R-Bs to the businesses. The second part is an interview study with business practitioners and others with experience and expertise in R-Bs.
We set out to try to understand the extent to which software-intensive businesses were aware of R-Bs, the technical and business reasons they were or were not using R-Bs and to document the business and technical use cases for R-Bs. The key findings were that businesses are aware of R-Bs, and some are using R-Bs as part of their day-to-day development process. Some of the uses for R-Bs we found were not previously documented. We also found that businesses understood the value R-Bs have as part of engineering and software quality processes. They are also aware of the costs of implementing R-Bs and that R-Bs are an intangible value proposition - in other words, businesses can add value through process improvement by using R-Bs. But, that, currently at least, R-Bs are not a selling point for software or products.
Chris: You performed a large number of interviews in order to prepare your paper. What was the most surprising response to you?
Simon: Most surprising is a good question. Everybody I spoke to brought something new to my understanding of R-Bs, and many responses surprised me. The interviewees that surprised me most were I01 and I02 (interviews were anonymised and interviewees were assigned numeric identities).
I02 described the sceptical perspective that there is a viable, pragmatic alternative to R-Bs - verifiable builds - which I was aware of before undertaking the research. The company had developed a sufficiently robust system for their needs and worked well. With a large archive of software used in production, they couldn t justify the cost of retrofitting a different solution that might only offer small advantages over the existing system.
Doesn t really sound too surprising, but the interview was one of the first I did on this topic, and I was very focused on the value of, and need for, trust in a system that motivated the R-B. The solution used by the company requires trust, but they seem to have established sufficient trust for their needs by securing their build systems to the extent that they are more or less tamper-proof.
The other big surprise for me was I01 s use of R-Bs to support the verification of system configuration in a system with multiple embedded components at boot time. It s such an obvious application of R-Bs, and exactly the kind of response I hoped to get from interviewees. However, it is another instance of a solution where trust is only one factor. In the first instance, the developer is using R-Bs to establish trust in the toolchain. There is also the second application that the developer can use a set of R-Bs to establish that deployed system consists of compatible components. While this might not sound too significant, there appear to be some important potential applications. One that came to mind immediately is a problem with firmware updates on nodes in IoT systems where the node needs to update quickly with limited downtime and without failure. The node also needs to be able to roll back any update proposed by a server if there are conflicts with the current configuration or if any tests on the node fail. Perhaps the chances of failure could be reduced, if a node can instead negotiate with a server to determine a safe path to migrate from its current configuration to a working configuration with the upgraded components the central system requires? Another potential application appears to be in the configuration management of AI systems, where decisions need to be explainable. A means of specifying validated configurations of training data, models and deployed systems might, perhaps, be leveraged to prevent invalid or broken configurations from being deployed in production.
Chris: One of your findings was that reproducible builds were perceived to be good engineering practice . To what extent do you believe cultural forces affect the adoption or rejection of a given technology or practice?
Simon: To a large extent. People s decisions are informed by cultural norms, and business decisions are made by people acting collectively. Of course, decision-making, including assessments of risk and usefulness, is mediated by individual positions on the continuum from conformity to non-conformity, as well as individual and in-group norms. Whether a business will consider a given technology for adoption will depend on cultural forces. The decision to adopt may well be made on the grounds of cost and benefits.
Chris: Another conclusion implied by your research is that businesses are often dealing with software deployment lifespans (eg. 20+ years) that differ from widely from those of the typical hobbyist programmer. To what degree do you think this temporal mismatch is a problem for both groups?
Simon: This is a fascinating question. Long-term software maintenance is a requirement in some industries because of the working lifespans of the products and legal requirements to maintain the products for a fixed period. For some other industries, it is less of a problem. Consequently, I would tend to divide developers into those who have been exposed to long-term maintenance problems and those who have not. Although, more professional than hobbyist developers will have been exposed to the problem. Nonetheless, there are areas, such as music software, where there are also long-term maintenance challenges for data formats and software.
Chris: Based on your research, what would you say are the biggest blockers for the adoption of reproducible builds within business ? And, based on this, would you have any advice or recommendations for the broader reproducible builds ecosystem?
Simon: From the research, the main blocker appears to be cost. Not an absolute cost, but there is an overhead to introducing R-Bs. Businesses (and thus business managers) need to understand the business case for R-Bs.
Making decision-makers in businesses aware of R-Bs and that they are valuable will take time. Advocacy at multiple levels appears to be the way forward and this is being done. I would recommend being persistent while being patient and to keep talking about reproducible builds. The work done in Linux distributions raises awareness of R-Bs amongst developers. Guix, NixOS and Software Heritage are all providing practical solutions and getting attention - I ve been seeing progressively more mentions of all three during the last couple of years. Increased awareness amongst developers should lead to more interest within companies. There is also research money being assigned to supply chain security and R-B s. The CHAINS project at KTH in Stockholm is one example of a strategic research project. There may be others that I m not aware of. The policy-level advocacy is slowly getting results in some countries, and where CISA leads, others may follow.
Chris: Was there a particular reason you alighted on the question of the adoption of reproducible builds in business? Do you think there s any truth behind the shopworn stereotype of hacker types neglecting the resources that business might be able to offer?
Simon: Much of the motivation for the research came from the contrast between the visibility of R-Bs in open source projects and the relative invisibility of R-Bs in industry. Where companies are known to be using R-Bs (e.g. Google, etc.) there is no fuss, no hype. They were not selling R-Bs as a solution; instead the documentation is very matter-of-fact that R-Bs are part of a customer-facing process in their cloud solutions.
An obvious question for me was that if some people use R-B s in software development, why doesn t everybody? There are limits to the tooling for some programming languages that mean R-Bs are difficult or impossible. But where creating an R-B is practical, why are they not used more widely?
So, to your second question. There is another factor, which seems to be more about a lack of communication rather than neglecting opportunities. Businesses may not always be willing to discuss their development processes and innovations. Though I do think the increasing number of conferences (big and small) for software practitioners is helping to facilitate more communication and greater exchange of ideas.
Chris: Has your personal view of reproducible builds changed since before you embarked on writing this paper?
Simon: Absolutely! In the early stages of the research, I was interested in questions of trust and how R-Bs were applied to resolve build and supply chain security problems. As the research developed, however, I started to see there were benefits to the use of R-Bs that were less obvious and that, in some cases, an R-B can have more than a single application.
Chris: Finally, do you have any plans to do future research touching on reproducible builds?
Simon: Yes, definitely. There are a set of problems that interest me. One already mentioned is the use of reproducible builds with AI systems. Interpretable or explainable AI (XAI) is a necessity, and I think that R-Bs can be used to support traceability in the configuration and testing of both deployed systems and systems used during model training and evaluation.
I would also like to return to a problem discussed briefly in the article, which is to develop a deeper understanding of the elements involved in the application of R-Bs that can be used to support reasoning about existing and potential applications of R-Bs. For example, R-Bs can be used to establish trust for different groups of individuals at different times, say, between remote developers prior to the release of software and by users after release. One question is whether when an R-B is used might be a significant factor. Another group of questions concerns the ways in which trust (of some sort) propagates among users of an R-B. There is an example in the paper of a company that rebuilds Debian reproducibly for security reasons and is then able to collaborate on software projects where software is built reproducibly with other companies that use public distributions of Debian.
Chris: Many thanks for this interview, Simon. If someone wanted to get in touch or learn more about you and your colleagues at the School of Informatics, where might they go?
Thank you for the opportunity. It has been a pleasure to reflect a little more widely on the research!
Personally, you can find out about my work on my official homepage and on my personal site. The software systems research group (SSRG) has a website, and the University of Sk vde s English language pages are also available.
Chris: Many thanks for this interview, Simon!
For more information about the Reproducible Builds project, please see our website at
reproducible-builds.org. If you are interested in
ensuring the ongoing security of the software that underpins our civilisation
and wish to sponsor the Reproducible Builds project, please reach out to the
project by emailing
contact@reproducible-builds.org.
/usr-merge, by Helmut Grohne, et al
The work on /usr-merge continues from
May. The lengthy
discussion was condensed into a still lengthy
rewrite of DEP17 listing all known
problems and proposed mitigations. An initial
consensus call
did not resolve all questions, but we now have rough consensus on finalizing
the transition without relying on major changes to dpkg. Other questions still
have diverging opinions and some matters such as
how to not break backports
are still missing satisfying answers.
DebConf Bursary prep, by Utkarsh Gupta
DebCamp and DebConf is happening from 03rd September to 17th September in
Kochi, India, and the DebConf Bursary team is gearing up for that. After
extending the bursary deadline
(catering to the requests coming in from various people), we ve finally managed
to clock over 260 bursary requests. The team is set up and we re starting to
review the applications. The team intends to roll out the result as soon as
possible.
debci, by Helmut Grohne
As Freexian is working on deploying autopkgtests for the LTS and ELTS services,
debci and autopkgtests were improved in Debian to better deal with derivatives
(e.g. by better supporting external package signing keyrings). Other aspects
that are not deployed on ci.debian.net such as the qemu backend were also
improved. We express thanks to the relevant maintainers Antonio Terceiro, Paul
Gevers and Simon McVittie for their timely reviews and merges of our changes.
Miscellaneous contributions
Following the release of Debian 12, Rapha l Hertzog updated
tracker.debian.org to be aware of trixie. He
also pushed some fixes to distro-tracker
(the software powering tracker.debian.org) and released version 1.2.0 (since
the former release was lacking fixes to run on Debian 12 bookworm).
Following the release of Debian 12, Helmut Grohne updated
crossqa.debian.net systems. He also sent 7
patches for cross build failures and continued adapting
rebootstrap to changes
in unstable.
Santiago Ruano Rinc n started to work on how to improve the robustness of
Salsa CI s pipeline for some jobs failing frequently.
Thorsten Alteholz did security updates of cpdb-libs in Unstable and Bookworm.
A personal reflection on how I moved from my Debian home to find two new homes with Trisquel and Guix for my own ethical computing, and while doing so settled my dilemma about further Debian contributions.Debian s contributions to the free software community has been tremendous. Debian was one of the early distributions in the 1990 s that combined the GNU tools (compiler, linker, shell, editor, and a set of Unix tools) with the Linux kernel and published a free software operating system. Back then there were little guidance on how to publish free software binaries, let alone entire operating systems. There was a lack of established community processes and conflict resolution mechanisms, and lack of guiding principles to motivate the work. The community building efforts that came about in parallel with the technical work has resulted in a steady flow of releases over the years.
From the work of Richard Stallman and the Free Software Foundation (FSF) during the 1980 s and early 1990 s, there was at the time already an established definition of free software. Inspired by free software definition, and a belief that a social contract helps to build a community and resolve conflicts, Debian s social contract (DSC) with the free software community was published in 1997. The DSC included the Debian Free Software Guidelines (DFSG), which directly led to the Open Source Definition.
I was introduced to GNU/Linux through Slackware in the early 1990 s (oh boy those nights calculating XFree86 modeline s and debugging sendmail.cf) and primarily used RedHat Linux during ca 1995-2003. I switched to Debian during the Woody release cycles, when the original RedHat Linux was abandoned and Fedora launched. It was Debian s explicit community processes and infrastructure that attracted me. The slow nature of community processes also kept me using RedHat for so long: centralized and dogmatic decision processes often produce quick and effective outcomes, and in my opinion RedHat Linux was technically better than Debian ca 1995-2003. However the RedHat model was not sustainable, and resulted in the RedHat vs Fedora split. Debian catched up, and reached technical stability once its community processes had been grounded. I started participating in the Debian community around late 2006.
My interpretation of Debian s social contract is that Debian should be a distribution of works licensed 100% under a free license. The Debian community has always been inclusive towards non-free software, creating the contrib/non-free section and permitting use of the bug tracker to help resolve issues with non-free works. This is all explained in the social contract. There has always been a clear boundary between free and non-free work, and there has been a commitment that the Debian system itself would be 100% free.
The concern that RedHat Linux was not 100% free software was not critical to me at the time: I primarily (and happily) ran GNU tools on Solaris, IRIX, AIX, OS/2, Windows etc. Running GNU tools on RedHat Linux was an improvement, and I hadn t realized it was possible to get rid of all non-free software on my own primary machine. Debian realized that goal for me. I ve been a believer in that model ever since. I can use Solaris, macOS, Android etc knowing that I have the option of using a 100% free Debian.
While the inclusive approach towards non-free software invite and deserve criticism (some argue that being inclusive to non-inclusive behavior is a bad idea), I believe that Debian s approach was a successful survival technique: by being inclusive to and a compromise between free and non-free communities, Debian has been able to stay relevant and contribute to both environments. If Debian had not served and contributed to the free community, I believe free software people would have stopped contributing. If Debian had rejected non-free works completely, I don t think the successful Ubuntu distribution would have been based on Debian.
I wrote the majority of the text above back in September 2022, intending to post it as a way to argue for my proposal to maintain the status quo within Debian. I didn t post it because I felt I was saying the obvious, and that the obvious do not need to be repeated, and the rest of the post was just me going down memory lane.
The Debian project has been a sustainable producer of a 100% free OS up until Debian 11 bullseye. In the resolution on non-free firmware the community decided to leave the model that had resulted in a 100% free Debian for so long. The goal of Debian is no longer to publish a 100% free operating system, instead this was added: The Debian official media may include firmware . Indeed the Debian 12 bookworm release has confirmed that this would not only be an optional possibility. The Debian community could have published a 100% free Debian, in parallel with the non-free Debian, and still be consistent with their newly adopted policy, but chose not to. The result is that Debian s policies are not consistent with their actions. It doesn t make sense to claim that Debian is 100% free when the Debian installer contains non-free software. Actions speaks louder than words, so I m left reading the policies as well-intended prose that is no longer used for guidance, but for the peace of mind for people living in ivory towers. And to attract funding, I suppose.
So how to deal with this, on a personal level? I did not have an answer to that back in October 2022 after the vote. It wasn t clear to me that I would ever want to contribute to Debian under the new social contract that promoted non-free software. I went on vacation from any Debian work. Meanwhile Debian 12 bookworm was released, confirming my fears. I kept coming back to this text, and my only take-away was that it would be unethical for me to use Debian on my machines. Letting actions speak for themselves, I switched to PureOS on my main laptop during October, barely noticing any difference since it is based on Debian 11 bullseye. Back in December, I bought a new laptop and tried Trisquel and Guix on it, as they promise a migration path towards ppc64el that PureOS do not.
While I pondered how to approach my modest Debian contributions, I set out to learn Trisquel and gained trust in it. I migrated one Debian machine after another to Trisquel, and started to use Guix on others. Migration was easy because Trisquel is based on Ubuntu which is based on Debian. Using Guix has its challenges, but I enjoy its coherant documented environment. All of my essential self-hosted servers (VM hosts, DNS, e-mail, WWW, Nextcloud, CI/CD builders, backup etc) uses Trisquel or Guix now. I ve migrated many GitLab CI/CD rules to use Trisquel instead of Debian, to have a more ethical computing base for software development and deployment. I wish there were official Guix docker images around.
Time has passed, and when I now think about any Debian contributions, I m a little less muddled by my disappointment of the exclusion of a 100% free Debian. I realize that today I can use Debian in the same way that I use macOS, Android, RHEL or Ubuntu. And what prevents me from contributing to free software on those platforms? So I will make the occasional Debian contribution again, knowing that it will also indirectly improve Trisquel. To avoid having to install Debian, I need a development environment in Trisquel that allows me to build Debian packages. I have found a recipe for doing this:
# System commands: sudo apt-get install debhelper git-buildpackage debian-archive-keyring sudo wget -O /usr/share/debootstrap/scripts/debian-common https://sources.debian.org/data/main/d/debootstrap/1.0.128%2Bnmu2/scripts/debian-common sudo wget -O /usr/share/debootstrap/scripts/sid https://sources.debian.org/data/main/d/debootstrap/1.0.128%2Bnmu2/scripts/sid # Run once to create build image: DIST=sid git-pbuilder create --mirror http://deb.debian.org/debian/ --debootstrapopts "--exclude=usr-is-merged" --basepath /var/cache/pbuilder/base-sid.cow # Run in a directory with debian/ to build a package: gbp buildpackage --git-pbuilder --git-dist=sid
How to sustainably deliver a 100% free software binary distributions seems like an open question, and the challenges are not all that different compared to the 1990 s or early 2000 s. I m hoping Debian will come back to provide a 100% free platform, but my fear is that Debian will compromise even further on the free software ideals rather than the opposite. With similar arguments that were used to add the non-free firmware, Debian could compromise the free software spirit of the Linux boot process (e.g., non-free boot images signed by Debian) and media handling (e.g., web browsers and DRM), as Debian have already done with appstore-like functionality for non-free software (Python pip). To learn about other freedom issues in Debian packaging, browsing Trisquel s helper scripts may enlight you.
Debian s setback and the recent setback for RHEL-derived distributions are sad, and it will be a challenge for these communities to find internally consistent coherency going forward. I wish them the best of luck, as Debian and RHEL are important for the wider free software eco-system. Let s see how the community around Trisquel, Guix and the other FSDG-distributions evolve in the future.
The situation for free software today appears better than it was years ago regardless of Debian and RHEL s setbacks though, which is important to remember! I don t recall being able install a 100% free OS on a modern laptop and modern server as easily as I am able to do today.
Happy Hacking!
Addendum 22 July 2023: The original title of this post was Coping with non-free Debian, and there was a thread about it that included feedback on the title. I do agree that my initial title was confrontational, and I ve changed it to the more specific Coping with non-free software in Debian. I do appreciate all the fine free software that goes into Debian, and hope that this will continue and improve, although I have doubts given the opinions expressed by the majority of developers. For the philosophically inclined, it is interesting to think about what it means to say that a compilation of software is freely licensed. At what point does a compilation of software deserve the labels free vs non-free? Windows probably contains some software that is published as free software, let s say Windows is 1% free. Apple authors a lot of free software (as a tangent, Apple probably produce more free software than what Debian as an organization produces), and let s say macOS contains 20% free software. Solaris (or some still maintained derivative like OpenIndiana) is mostly freely licensed these days, isn t it? Let s say it is 80% free. Ubuntu and RHEL pushes that closer to let s say 95% free software. Debian used to be 100% but is now slightly less at maybe 99%. Trisquel and Guix are at 100%. At what point is it reasonable to call a compilation free? Does Debian deserve to be called freely licensed? Does macOS? Is it even possible to use these labels for compilations in any meaningful way? All numbers just taken from thin air. It isn t even clear how this can be measured (binary bytes? lines of code? CPU cycles? etc). The caveat about license review mistakes applies. I ignore Debian s own claims that Debian is 100% free software, which I believe is inconsistent and no longer true under any reasonable objective analysis. It was not true before the firmware vote since Debian ships with non-free blobs in the Linux kernel for example.
The OpenSSH project added support for a hybrid Streamlined NTRU Prime post-quantum key encapsulation method sntrup761 to strengthen their X25519-based default in their version 8.5 released on 2021-03-03. While there has been a lot of talk about post-quantum crypto generally, my impression has been that there has been a slowdown in implementing and deploying them in the past two years. Why is that? Regardless of the answer, we can try to collaboratively change things, and one effort that appears strangely missing are IETF documents for these algorithms.
Building on some earlier work that added X25519/X448 to SSH, writing a similar document was relatively straight-forward once I had spent a day reading OpenSSH and TinySSH source code to understand how it worked. While I am not perfectly happy with how the final key is derived from the sntrup761/X25519 secrets it is a SHA512 call on the concatenated secrets I think the construct deserves to be better documented, to pave the road for increased confidence or better designs. Also, reusing the RFC5656 4 structs makes for a worse specification (one unnecessary normative reference), but probably a simpler implementation. I have published draft-josefsson-ntruprime-ssh-00 here. Credit here goes to Jan Moj of TinySSH that designed the earlier sntrup4591761x25519-sha512@tinyssh.org in 2018, Markus Friedl who added it to OpenSSH in 2019, and Damien Miller that changed it to sntrup761 in 2020. Does anyone have more to add to the history of this work?
Once I had sharpened my xml2rfc skills, preparing a document describing the hybrid construct between the sntrup761 key-encapsulation mechanism and the X25519 key agreement method in a non-SSH fashion was easy. I do not know if this work is useful, but it may serve as a reference for further study. I published draft-josefsson-ntruprime-hybrid-00 here.
Finally, how about a IETF document on the base Streamlined NTRU Prime? Explaining all the details, and especially the math behind it would be a significant effort. I started doing that, but realized it is a subjective call when to stop explaining things. If we can t assume that the reader knows about lattice math, is a document like this the best place to teach it? I settled for the most minimal approach instead, merely giving an introduction to the algorithm, included SageMath and C reference implementations together with test vectors. The IETF audience rarely understands math, so I think it is better to focus on the bits on the wire and the algorithm interfaces. Everything here was created by the Streamlined NTRU Prime team, I merely modified it a bit hoping I didn t break too much. I have now published draft-josefsson-ntruprime-streamlined-00 here.
I maintain the IETF documents on my ietf-ntruprime GitLab page, feel free to open merge requests or raise issues to help improve them.
To have confidence in the code was working properly, I ended up preparing a branch with sntrup761 for the GNU-project Nettle and have submitted it upstream for review. I had the misfortune of having to understand and implement NIST s DRBG-CTR to compute the sntrup761 known-answer tests, and what a mess it is. Why does a deterministic random generator support re-seeding? Why does it support non-full entropy derivation? What s with the key size vs block size confusion? What s with the optional parameters? What s with having multiple algorithm descriptions? Luckily I was able to extract a minimal but working implementation that is easy to read. I can t locate DRBG-CTR test vectors, anyone? Does anyone have sntrup761 test vectors that doesn t use DRBG-CTR? One final reflection on publishing known-answer tests for an algorithm that uses random data: are the test vectors stable over different ways to implement the algorithm? Just consider of some optimization moved one randomness-extraction call before another, then wouldn t the output be different? Are there other ways to verify correctness of implementations?
As always, happy hacking!
Welcome to the April 2023 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (eg. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans) which was announced in August 2021 where an archive of all issued signatures is publically accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve, in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed and a popular approach to address this problem is called bootstrappable builds where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes an somewhat recursive approach to the problem for Haskell, leading to the inadvertently humourous question: Can I turn all of GHC into one module, and compile that?
Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Court s wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[ ] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program something that had never been achieved, to our knowledge, since the birth of Unix.
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start.
There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.
Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Micahel, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Model (LLMs) trained on public data in the same order designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can affected by training data and model scale.
Back in February s report we reported on a series of changes to the Sphinx documentation generator that was initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
Author and public speaker, V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy which is the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what s inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what s inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. []
Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.
Chris Lamb attempted a number of ways to try and fix literal : .lead appearing in the page [][][], made all the Back to who is involved links italics [], and corrected the syntax of the _data/sponsors.yml file [].
Holger Levsen added his recent talk [], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [][][], reworked the People page a little [] [], as well as fixed spelling of Arch Linux [].
Lastly, Mattia Rizzolo moved some old sponsors to a former section [] and Simon Josefsson added Trisquel GNU/Linux. []
Debian
Vagrant Cascadian reported on the Debian s build-essential package set, which was inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general . Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears . [ ]
Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing dinstalls that is to say, the snapshot service is not capturing 100% of all of historical states of the Debian archive. This is relevant to reproducibility because without the availability historical versions, it is becomes impossible to repeat a build at a future date in order to correlate checksums. .
20 reviews of Debian packages were added, 21 were updated and 5 were removed this month adding to our knowledge about identified issues. Chris Lamb added a new build_path_in_line_annotations_added_by_ruby_ragel toolchain issue. [ ]
Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.orghas been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Testing framework
The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:
Holger Levsen:
Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [][][][][][]
Add the new APT repo url for Jenkins itself with a new signing key. [][]
In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. []
Updated Arch Linux testing to cleanup leftover files left in /tmp/archlinux-ci/ after three days. [][][]
Introduce the archived suites configuration option. []][]
Fix the KGB bot configuration to support pyyaml 6.0 as present in Debian bookworm. []
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
We've been joined by Simon (smcv) - lots of chat bouncing backwards and forwards. Laptops appearing out of backpacks suddenly being repurposed.Settling very much into a rhythm and routine.
Working with two laptops on your lap ends up being quite heavy :)
Let s reflect on some of my recent work that started with understanding Trisquel GNU/Linux, improving transparency into apt-archives, working on reproducible builds of Trisquel, strengthening verification of apt-archives with Sigstore, and finally thinking about security device threat models. A theme in all this is improving methods to have trust in machines, or generally any external entity. While I believe that everything starts by trusting something, usually something familiar and well-known, we need to deal with misuse of that trust that leads to failure to deliver what is desired and expected from the trusted entity. How can an entity behave to invite trust? Let s argue for some properties that can be quantitatively measured, with a focus on computer software and hardware:
Deterministic Behavior given a set of circumstances, it should behave the same.
Verifiability and Transparency the method (the source code) should be accessible for understanding (compare scientific method) and its binaries verifiable, i.e., it should be possible to verify that the entity actually follows the intended deterministic method (implying efforts like reproducible builds and bootstrappable builds).
Accountable the entity should behave the same for everyone, and deviation should be possible prove in a way that is hard to deny, implying efforts such as Certificate Transparency and more generic checksum logs like Sigstore and Sigsum.
Liberating the tools and documentation should be available as free software to enable you to replace the trusted entity if so desired. An entity that wants to restrict you from being able to replace the trusted entity is vulnerable to corruption and may stop acting trustworthy. This point of view reinforces that open source misses the point; it has become too common to use trademark laws to restrict re-use of open source software (e.g., firefox, chrome, rust).
Essentially, this boils down to: Trust, Verify and Hold Accountable. To put this dogma in perspective, it helps to understand that this approach may be harmful to human relationships (which could explain the social awkwardness of hackers), but it remains useful as a method to improve the design of computer systems, and a useful method to evaluate safety of computer systems. When a system fails some of the criteria above, we know we have more work to do to improve it.
How far have we come on this journey? Through earlier efforts, we are in a fairly good situation. Richard Stallman through GNU/FSF made us aware of the importance of free software, the Reproducible/Bootstrappable build projects made us aware of the importance of verifiability, and Certificate Transparency highlighted the need for accountable signature logs leading to efforts like Sigstore for software. None of these efforts would have seen the light of day unless people wrote free software and packaged them into distributions that we can use, and built hardware that we can run it on. While there certainly exists more work to be done on the software side, with the recent amazing full-source build of Guix based on a 357-byte hand-written seed, I believe that we are closing that loop on the software engineering side.
So what remains? Some inspiration for further work:
Accountable binary software distribution remains unresolved in practice, although we have some software components around (e.g., apt-sigstore and guix git authenticate). What is missing is using them for verification by default and/or to improve the signature process to use trustworthy hardware devices, and committing the signatures to transparency logs.
Trustworthy hardware to run trustworthy software on remains a challenge, and we owe FSF s Respect Your Freedom credit for raising awareness of this. Many modern devices requires non-free software to work which fails most of the criteria above and are thus inherently untrustworthy.
Verifying rebuilds of currently published binaries on trustworthy hardware is unresolved.
Completing a full-source rebuild from a small seed on trustworthy hardware remains, preferably on a platform wildly different than X86 such as Raptor s Talos II.
We need improved security hardware devices and improved established practices on how to use them. For example, while Gnuk on the FST enable a trustworthy software and hardware solution, the best process for using it that I can think of generate the cryptographic keys on a more complex device. Efforts like Tillitis are inspiring here.
Onwards and upwards, happy hacking!
Update 2023-05-03: Added the Liberating property regarding free software, instead of having it be part of the Verifiability and Transparency .
I d like to describe and discuss a threat model for computational devices. This is generic but we will narrow it down to security-related devices. For example, portable hardware dongles used for OpenPGP/OpenSSH keys, FIDO/U2F, OATH HOTP/TOTP, PIV, payment cards, wallets etc and more permanently attached devices like a Hardware Security Module (HSM), a TPM-chip, or the hybrid variant of a mostly permanently-inserted but removable hardware security dongles.
Our context is cryptographic hardware engineering, and the purpose of the threat model is to serve as as a thought experiment for how to build and design security devices that offer better protection. The threat model is related to the Evil maid attack.
Our focus is to improve security for the end-user rather than the traditional focus to improve security for the organization that provides the token to the end-user, or to improve security for the site that the end-user is authenticating to. This is a critical but often under-appreciated distinction, and leads to surprising recommendations related to onboard key generation, randomness etc below.
The Substitution Attack
Your takeaway should be that devices should be designed to mitigate harmful consequences if any component of the device (hardware or software) is substituted for a malicious component for some period of time, at any time, during the lifespan of that component. Some designs protect better against this attack than other designs, and the threat model can be used to understand which designs are really bad, and which are less so.
Terminology
The threat model involves at least one device that is well-behaving and one that is not, and we call these Good Device and Bad Device respectively. The bad device may be the same physical device as the good key, but with some minor software modification or a minor component replaced, but could also be a completely separate physical device. We don t care about that distinction, we just care if a particular device has a malicious component in it or not. I ll use terms like security device , device , hardware key , security co-processor etc interchangeably.
From an engineering point of view, malicious here includes unintentional behavior such as software or hardware bugs. It is not possible to differentiate an intentionally malicious device from a well-designed device with a critical bug.
Don t attribute to malice what can be adequately explained by stupidity, but don t na vely attribute to stupidity what may be deniable malicious.
What is some period of time ?
Some period of time can be any length of time: seconds, minutes, days, weeks, etc.
It may also occur at any time: During manufacturing, during transportation to the user, after first usage by the user, or after a couple of months usage by the user. Note that we intentionally consider time-of-manufacturing as a vulnerable phase.
Even further, the substitution may occur multiple times. So the Good Key may be replaced with a Bad Key by the attacker for one day, then returned, and later this repeats a month later.
What is harmful consequences ?
Since a security key has a fairly well-confined scope and purpose, we can get a fairly good exhaustive list of things that could go wrong. Harmful consequences include:
Attacker learns any secret keys stored on a Good Key.
Attacker causes user to trust a public generated by a Bad Key.
Attacker is able to sign something using a Good Key.
Attacker learns the PIN code used to unlock a Good Key.
Attacker learns data that is decrypted by a Good Key.
Thin vs Deep solutions
One approach to mitigate many issues arising from device substitution is to have the host (or remote site) require that the device prove that it is the intended unique device before it continues to talk to it. This require an authentication/authorization protocol, which usually involves unique device identity and out-of-band trust anchors. Such trust anchors is often problematic, since a common use-case for security device is to connect it to a host that has never seen the device before.
A weaker approach is to have the device prove that it merely belongs to a class of genuine devices from a trusted manufacturer, usually by providing a signature generated by a device-specific private key signed by the device manufacturer. This is weaker since then the user cannot differentiate two different good devices.
In both cases, the host (or remote site) would stop talking to the device if it cannot prove that it is the intended key, or at least belongs to a class of known trusted genuine devices.
Upon scrutiny, this solution is still vulnerable to a substitution attack, just earlier in the manufacturing chain: how can the process that injects the per-device or per-class identities/secrets know that it is putting them into a good key rather than a malicious device? Consider also the consequences if the cryptographic keys that guarantee that a device is genuine leaks.
The model of the thin solution is similar to the old approach to network firewalls: have a filtering firewall that only lets through intended traffic, and then run completely insecure protocols internally such as telnet.
The networking world has evolved, and now we have defense in depth: even within strongly firewall ed networks, it is prudent to run for example SSH with publickey-based user authentication even on locally physical trusted networks. This approach requires more thought and adds complexity, since each level has to provide some security checking.
I m arguing we need similar defense-in-depth for security devices. Security key designs cannot simply dodge this problem by assuming it is working in a friendly environment where component substitution never occur.
Example: Device authentication using PIN codes
To see how this threat model can be applied to reason about security key designs, let s consider a common design.
Many security keys uses PIN codes to unlock private key operations, for example on OpenPGP cards that lacks built-in PIN-entry functionality. The software on the computer just sends a PIN code to the device, and the device allows private-key operations if the PIN code was correct.
Let s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious device that saves a copy of the PIN code presented to it, and then gives out error messages. Once the user has entered the PIN code and gotten an error message, presumably temporarily giving up and doing other things, the attacker replaces the device back again. The attacker has learnt the PIN code, and can later use this to perform private-key operations on the good device.
This means a good design involves not sending PIN codes in clear, but use a stronger authentication protocol that allows the card to know that the PIN was correct without learning the PIN. This is implemented optionally for many OpenPGP cards today as the key-derivation-function extension. That should be mandatory, and users should not use setups that sends device authentication in the clear, and ultimately security devices should not even include support for that. Compare how I build Gnuk on my PGP card with the kdf_do=required option.
Example: Onboard non-predictable key-generation
Many devices offer both onboard key-generation, for example OpenPGP cards that generate a Ed25519 key internally on the devices, or externally where the device imports an externally generated cryptographic key.
Let s apply the subsitution threat model to this design: the user wishes to generate a key and trust the public key that came out of that process. The attacker substitutes the device for a malicious device during key-generation, imports the private key into a good device and gives that back to the user. Most of the time except during key generation the user uses a good device but still the attacker succeeded in having the user trust a public key which the attacker knows the private key for. The substitution may be a software modification, and the method to leak the private key to the attacker may be out-of-band signalling.
This means a good design never generates key on-board, but imports them from a user-controllable environment. That approach should be mandatory, and users should not use setups that generates private keys on-board, and ultimately security devices should not even include support for that.
Example: Non-predictable randomness-generation
Many devices claims to generate random data, often with elaborate design documents explaining how good the randomness is.
Let s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious design that generates (for the attacker) predictable randomness. The user will never be able to detect the difference since the random output is, well, random, and typically not distinguishable from weak randomness. The user cannot know if any cryptographic keys generated by a generator was faulty or not.
This means a good design never generates non-predictable randomness on the device. That approach should be mandatory, and users should not use setups that generates non-predictable randomness on the device, and ideally devices should not have this functionality.
Case-Study: Tillitis
I have warmed up a bit for this. Tillitis is a new security device with interesting properties, and core to its operation is the Compound Device Identifier (CDI), essentially your Ed25519 private key (used for SSH etc) is derived from the CDI that is computed like this:
Let s apply the substitution threat model to this design: Consider someone replacing the Tillitis key with a malicious key during postal delivery of the key to the user, and the replacement device is identical with the real Tillitis key but implements the following key derivation function:
Where weakprng is a compromised algorithm that is predictable for the attacker, but still appears random. Everything will work correctly, but the attacker will be able to learn the secrets used by the user, and the user will typically not be able to tell the difference since the CDI is secret and the Ed25519 public key is not self-verifiable.
Conclusion
Remember that it is impossible to fully protect against this attack, that s why it is merely a thought experiment, intended to be used during design of these devices. Consider an attacker that never gives you access to a good key and as a user you only ever use a malicious device. There is no way to have good security in this situation. This is not hypothetical, many well-funded organizations do what they can to derive people from having access to trustworthy security devices. Philosophically it does not seem possible to tell if these organizations have succeeded 100% already and there are only bad security devices around where further resistance is futile, but to end on an optimistic note let s assume that there is a non-negligible chance that they haven t succeeded. In these situations, this threat model becomes useful to improve the situation by identifying less good designs, and that s why the design mantra of mitigate harmful consequences is crucial as a takeaway from this. Let s improve the design of security devices that further the security of its users!
In a previous blog post I described the use of virtual
postings to track accidental personal/family expenses. I've always been
uncomfortable with that, and in hledger 1yr I outlined a potential scheme
for finally addressing the virtual posting problem.
separate journals
My outline built on top of continuing to maintain both personal and family
financial data in the same place, but I've decided that this can't work,
because the different "directions" (or signs) of accidental transactions
originating from either the family or personal side can't be addressed with any
kind of alternate view on the same data.
To illustrate with an example.
A negative balance in family:liabilities:jon means "family owes jon". A
coffee bought by mistake on the family credit card will have a negative
posting on the credit card, and thus a positive one on the liabilities
account. ("jon owes family"). That's fine.
But what about when I buy family stuff on a personal card? The other side of
of the transaction is also going to have a positive sign, so it can't be
posted to family:liabilities:jon: it would have to go to somewhere else,
like jon:liabilities:family. Now I have two accounts which track versions
of the same thing, and they cannot be combined with a simple transaction
since they're looking at the same value from opposite directions (and signs).
Back when I first described the problem I was using
a single journal file for all my transactions. After moving to lots of separate
journal files (in hledger 1yr), it's become clearer to me that I don't
need to maintain the Family and Personal data together, at all: they can be
entirely separate journals.
getting data between journals
When I moved to a new set of ledger files for 2023, I needed to carry forward
the balances from 2022 in the form of "opening balance" transactions. This was
achieved by a report on the 2022 data, exported as CSV, and imported into the
2023 data (all following the scheme outlined by fully-fledged hledger.))
Separate Personal and Family journals need some information from each other, and I
can achieve that in the same way as for opening balances: with an export of
the relevant transactions as CSV, then imported on the other side. HLedger's
CSV import system is flexible enough that we can effectively invert the sign
of liabilities, addressing the problem above.
Worked example
We start with an accidental coffee purchased on the family card (and so this
belongs to the Family ledger)
I've encoded the expense category that the Personal ledger will be interested
in (the last bit, expenses:coffee) as a sub-account of the liabilities
category that the Family ledger is interested in1 (the first bit,
liabilities:jon). When viewed on the Family side, the expense category is not
interesting, and we can hide it with HLedger's alias feature2:
This is then converted into a journal file by hledger import. The rules file
for the import is very simple: the fields date, description, account1
and amount are taken as-is; account2 is hard-coded to liabilities:family.
The resulting transaction looks like
Before this journal is included by the main one, we have to adjust the expense
account, to remove the liabilities:jon: prefix. The import rules can't do
this3 , so we use another journal file as a go-between with another alias
rule:
alias /^liabilities.jon:/ =
This results, finally, in the following transaction in the Personal ledger:
avoiding double-counting
There's one set of transactions that we don't want to export across this divide,
and that's because they're already there: any time I transfer money from myself
to the family accounts (or vice versa) to address the accrued debt, the transaction
is visible from both my family and personal statements. To avoid exporting these
and double-counting them, I make sure those transactions don't post to an account
matching the pattern used in the hledger reg report. That's what the trailing
colon is for: It ensures I only export transactions which are to a sub-account
of liabilities:jon, and not to the root account liabilities:jon itself:
which is where I put the repayment transactions. I could instead use a more
explicit sub-account like liabilities:jon:repayments or similar, since the
trailing colon is quite subtle, but this works for me.
Wrap up
I've been really on the fence as to whether the complexity of this scheme is
worth it to avoid the virtual postings. The previous scheme was much simpler.
I have definitely made some mistakes with it, which didn't get caught by the
double-entry rules that virtual postings ignore, but they're for small sums
of money anyway.
On the other hand, a lot of the "machinery" of this already existed for getting
opening balances between calendar years, and the gory details are written down
and hidden inside the Makefile. I also expect that I will continue to see
advantages in having Family and Personal entirely separate, as they can each
develop and adapt to their own needs without having to consider the other side
of things every time.
It's a running experiment, and time will tell if it's a good idea.
This scheme was originally suggested to me by Pranesh on Twitter
(described in dues), but I discounted it at the time
because of the exact arrangement they suggested, not realising
the broader idea might work.
I've hand-waved one problem with using hledger aliases here. If
we use them as described, to hide the Personal expense details, we need
them to not be applied when performing the CSV-generating report.
Therefore, in practise I have them in a front-most family/2023.journal
file, which imports the data from another family/2023-back.journal,
and the CSV export is performed on the backing journal with the data
and not the alias.
HLedger import rules can't manipulate the fields from the CSV a great
deal, but one change I proposed and started hacking on would allow for
this: to expose Regexp match-groups as interpolatable tokens:
https://github.com/simonmichael/hledger/issues/2009.
As suggested in my initial announcement of apt-sigstore my plan was to look into stronger uses of Sigstore than rekor, and I m now happy to announce that the apt-cosign plugin has been added to apt-sigstore and the operational project debdistcanary is publishing cosign-statements about the InRelease file published by the following distributions: Trisquel GNU/Linux, PureOS, Gnuinos, Ubuntu, Debian and Devuan.
Summarizing the commands that you need to run as root to experience the great new world:
Then run your usual apt-get update and look in the syslog to debug things.
This is the kind of work that gets done while waiting for the build machines to attempt to reproducibly build PureOS. Unfortunately, the results is that a meager 16% of the 765 added/modifed packages are reproducible by me. There is some infrastructure work to be done to improve things: we should use sbuild for example. The build infrastructure should produce signed statements for each package it builds: One statement saying that it attempted to reproducible build a particular binary package (thus generated some build logs and diffoscope-output for auditing), and one statements saying that it actually was able to reproduce a package. Verifying such claims during apt-get install or possibly dpkg -i is a logical next step.
There is some code cleanups and release work to be done now. Which distribution will be the first apt-based distribution that includes native support for Sigstore? Let s see.
Sigstore is not the only relevant transparency log around, and I ve been trying to learn a bit about Sigsum to be able to support it as well. The more improved confidence about system security, the merrier!
Building on my work to rebuild Trisquel GNU/Linux 11.0 aramo, it felt simple to generalize the tooling to any two apt-repository pairs and I ve created debdistreproduce as a template-project for doing this through the infrastructure of GitLab CI/CD and meanwhile even set up my own gitlab-runner on spare hardware. I ve brought over reproduce/trisquel to using debdistreproduce as well, and archived the old reproduce-trisquel project.
After fixing some quirks, building Devuan GNU+Linux 4.0 Chimaera was fairly quick since they do not modify that many packages, and I m now able to reproduce 46% of the packages that Devuan Chimaera add/modify on amd64. I have more work in progress here (hint: reproduce/pureos), but PureOS is considerably larger than both Trisquel and Devuan together. I m not sure how interested Devuan or PureOS are in reproducible builds though.
Reflecting on this work made me realize that while the natural thing to do here was to differentiate two different apt-based distributions, I have realized the same way I did for debdistdiff that it would also be interesting to compare, say, Debian bookworm from Debian unstable, especially now that they should be fairly close together. My tooling should support that too. However, to really provide any benefit from the more complete existing reproducible testing of Debian, some further benefit from doing that would be useful and I can t articulate one right now.
One ultimate goal with my effort is to improve trust in apt-repositories, and combining transparency-style protection a la apt-sigstore with third-party validated reproducible builds may indeed be one such use-case that would benefit the wider community of apt-repositories. Imagine having your system not install any package unless it can verify it against a third-party reproducible build organization that commits their results in a tamper-proof transparency ledger. But I m now on repeat here, so will stop.
Do you want your apt-get update to only ever use files whose hash checksum have been recorded in the globally immutable tamper-resistance ledger rekor provided by the Sigstore project? Well I thought you d never ask, but now you can, thanks to my new projects apt-verify and apt-sigstore. I have not done proper stable releases yet, so this is work in progress. To try it out, adapt to the modern era of running random stuff from the Internet as root, and run the following commands. Use a container or virtual machine if you have trust issues.
If the stars are aligned (and the puppet projects of debdistget and debdistcanary have ran their GitLab CI/CD pipeline recently enough) you will see a successful output from apt-get update and your syslog will contain debug logs showing the entries from the rekor log for the release index files that you downloaded. See sample outputs in the README.
If you get tired of it, disabling is easy:
chmod -x /etc/apt/verify.d/apt-rekor
Our project currently supports Trisquel GNU/Linux 10 (nabia) & 11 (aramo), PureOS 10 (byzantium), Gnuinos chimaera, Ubuntu 20.04 (focal) & 22.04 (jammy), Debian 10 (buster) & 11 (bullseye), and Devuan GNU+Linux 4.0 (chimaera). Others can be supported to, please open an issue about it, although my focus is on FSDG-compliant distributions and their upstreams.
This is a continuation of my previous work on apt-canary. I have realized that it was better to separate out the generic part of apt-canary into my new project apt-verify that offers a plugin-based method, and then rewrote apt-canary to be one such plugin. Then apt-sigstore s apt-rekor was my second plugin for apt-verify.
Due to the design of things, and some current limitations, Ubuntu is the least stable since they push out new signed InRelease files frequently (mostly due to their use of Phased-Update-Percentage) and debdistget and debdistcanary CI/CD runs have a hard time keeping up. If you have insight on how to improve this, please comment me in the issue tracking the race condition.
There are limitations of what additional safety a rekor-based solution actually provides, but I expect that to improve as I get a cosign-based approach up and running. Currently apt-rekor mostly make targeted attacks less deniable. With a cosign-based approach, we could design things such that your machine only downloads updates when they have been publicly archived in an immutable fashion, or submitted for validation by a third-party such as my reproducible build setup for Trisquel GNU/Linux aramo.
What do you think? Happy Hacking!
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
tl;dr: go to reproduce-trisquel.
When I set about to understand how Trisquel worked, I identified a number of things that would improve my confidence in it. The lowest hanging fruit for me was to manually audit the package archive, and I wrote a tool called debdistdiff to automate this for me. That led me to think about apt archive transparency more in general. I have made some further work in that area (hint: apt-verify) that deserve its own blog post eventually. Most of apt archive transparency is futile if we don t trust the intended packages that are in the archive. One way to measurable increase trust in the package are to provide reproducible builds of the packages, which should by now be an established best practice. Code review is still important, but since it will never provide positive guarantees we need other processes that can identify sub-optimal situations automatically. The way reproducible builds easily identify negative results is what I believe has driven much of its success: its results are tangible and measurable. The field of software engineering is in need of more such practices.
The design of my setup to build Trisquel reproducible are as follows.
The project debdistget is responsible for downloading Release/Packages files (which are the most relevant files from dists/) from apt archives, and works by commiting them into GitLab-hosted git-repositories. I maintain several such repositories for popular apt-archives, including for Trisquel and its upstream Ubuntu. GitLab invokes a schedule pipeline to do the downloading, and there is some race conditions here.
The project debdistdiff is used to produce the list of added and modified packages, which are the input to actually being able to know what packages to reproduce. It publishes human readable summary of difference for several distributions, including Trisquel vs Ubuntu. Early on I decided that rebuilding all of the upstream Ubuntu packages is out of scope for me: my personal trust in the official Debian/Ubuntu apt archives are greater than my trust of the added/modified packages in Trisquel.
The final project reproduce-trisquel puts the pieces together briefly as follows, everything being driven from its .gitlab-ci.yml file.
There is a (manually triggered) job generate-build-image to create a build image to speed up CI/CD runs, using a simple Dockerfile.
There is a (manually triggered) job generate-package-lists that uses debdistdiff to generate and store package lists and puts its output in lists/. The reason this is manually triggered right now is due to a race condition.
There is a (scheduled) job that does two things: from the package lists, the script generate-ci-packages.sh builds a GitLab CI/CD instruction file ci-packages.yml that describes jobs for each package to build. The second part is generate-readme.sh that re-generate the project s README.md based on the build logs and diffoscope outputs that stored in the git repository.
Through the ci-packages.yml file, there is a large number of jobs that are dynamically defined, which currently are manually triggered to not overload the build servers. The script build-package.sh is invoked and attempts to rebuild a package, and stores build log and diffoscope output in the git project itself.
I did not expect to be able to use the GitLab shared runners to do the building, however they turned out to work quite well and I postponed setting up my own runner. There is a manually curated lists/disabled-aramo.txt with some packages that all required too much disk space or took over two hours to build. Today I finally took the time to setup a GitLab runner using podman running Trisquel aramo, and I expect to complete builds of the remaining packages soon one of my Dell R630 server with 256GB RAM and dual 2680v4 CPUs should deliver sufficient performance.
Current limitations and ideas on further work (most are filed as project issues) include:
We don t support *.buildinfo files. As far as I am aware, Trisquel does not publish them for their builds. Improving this would be a first step forward, anyone able to help? Compare buildinfo.debian.net. For example, many packages differ only in their NT_GNU_BUILD_ID symbol inside the ELF binary, see example diffoscope output for libgpg-error. By poking around in jenkins.trisquel.org I managed to discover that Trisquel built initramfs-utils in the randomized path /build/initramfs-tools-bzRLUp and hard-coding that path allowed me to reproduce that package. I expect the same to hold for many other packages. Unfortunately, this failure turned into success with that package moved the needle from 42% reproducibility to 43% however I didn t let that stand in the way of a good headline.
The mechanism to download the Release/Package-files from dists/ is not fool-proof: we may not capture all ever published such files. While this is less of a concern for reproducibility, it is more of a concern for apt transparency. Still, having Trisquel provide a service similar to snapshot.debian.org would help.
Having at least one other CPU architecture would be nice.
Due to lack of time and mental focus, handling incremental updates of new versions of packages is not yet working. This means we only ever build one version of a package, and never discover any newly published versions of the same package. Now that Trisquel aramo is released, the expected rate of new versions should be low, but still happens due to security or backports.
Porting this to test supposedly FSDG-compliant distributions such as PureOS and Gnuinos should be relatively easy. I m also looking at Devuan because of Gnuinos.
The elephant in the room is how reproducible Ubuntu is in the first place.
Happy Easter Hacking!
Update 2023-04-17: The original project reproduce-trisquel that was announced here has been archived and replaced with two projects, one generic debdistreproduce and one with results for Trisquel: reproduce/trisquel .
I ve used hardware-backed OpenPGP keys since 2006 when I imported newly generated rsa1024 subkeys to a FSFE Fellowship card. This worked well for several years, and I recall buying more ZeitControl cards for multi-machine usage and backup purposes. As a side note, I recall being unsatisfied with the weak 1024-bit RSA subkeys at the time my primary key was a somewhat stronger 1280-bit RSA key created back in 2002 but OpenPGP cards at the time didn t support more than 1024 bit RSA, and were (and still often are) also limited to power-of-two RSA key sizes which I dislike.
I had my master key on disk with a strong password for a while, mostly to refresh expiration time of the subkeys and to sign other s OpenPGP keys. At some point I stopped carrying around encrypted copies of my master key. That was my main setup when I migrated to a new stronger RSA 3744 bit key with rsa2048 subkeys on a YubiKey NEO back in 2014. At that point, signing other s OpenPGP keys was a rare enough occurrence that I settled with bringing out my offline machine to perform this operation, transferring the public key to sign on USB sticks. In 2019 I re-evaluated my OpenPGP setup and ended up creating a offline Ed25519 key with subkeys on a FST-01G running Gnuk. My approach for signing other s OpenPGP keys were still to bring out my offline machine and sign things using the master secret using USB sticks for storage and transport. Which meant I almost never did that, because it took too much effort. So my 2019-era Ed25519 key still only has a handful of signatures on it, since I had essentially stopped signing other s keys which is the traditional way of getting signatures in return.
None of this caused any critical problem for me because I continued to use my old 2014-era RSA3744 key in parallel with my new 2019-era Ed25519 key, since too many systems didn t handle Ed25519. However, during 2022 this changed, and the only remaining environment that I still used my RSA3744 key for was in Debian and they require OpenPGP signatures on the new key to allow it to replace an older key. I was in denial about this sub-optimal solution during 2022 and endured its practical consequences, having to use the YubiKey NEO (which I had replaced with a permanently inserted YubiKey Nano at some point) for Debian-related purposes alone.
In December 2022 I bought a new laptop and setup a FST-01SZ with my Ed25519 key, and while I have taken a vacation from Debian, I continue to extend the expiration period on the old RSA3744-key in case I will ever have to use it again, so the overall OpenPGP setup was still sub-optimal. Having two valid OpenPGP keys at the same time causes people to use both for email encryption (leading me to have to use both devices), and the WKD Key Discovery protocol doesn t like two valid keys either. At FOSDEM 23 I ran into Andre Heinecke at GnuPG and I couldn t help complain about how complex and unsatisfying all OpenPGP-related matters were, and he mildly ignored my rant and asked why I didn t put the master key on another smartcard. The comment sunk in when I came home, and recently I connected all the dots and this post is a summary of what I did to move my offline OpenPGP master key to a Nitrokey Start.
First a word about device choice, I still prefer to use hardware devices that are as compatible with free software as possible, but the FST-01G or FST-01SZ are no longer easily available for purchase. I got a comment about Nitrokey start in my last post, and had two of them available to experiment with. There are things to dislike with the Nitrokey Start compared to the YubiKey (e.g., relative insecure chip architecture, the bulkier form factor and lack of FIDO/U2F/OATH support) but as far as I know there is no more widely available owner-controlled device that is manufactured for an intended purpose of implementing an OpenPGP card. Thus it hits the sweet spot for me.
The first step is to run latest firmware on the Nitrokey Start for bug-fixes and important OpenSSH 9.0 compatibility and there are reproducible-built firmware published that you can install using pynitrokey. I run Trisquel 11 aramo on my laptop, which does not include the Python Pip package (likely because it promotes installing non-free software) so that was a slight complication. Building the firmware locally may have worked, and I would like to do that eventually to confirm the published firmware, however to save time I settled with installing the Ubuntu 22.04 packages on my machine:
$ sha256sum python3-pip*
ded6b3867a4a4cbaff0940cab366975d6aeecc76b9f2d2efa3deceb062668b1c python3-pip_22.0.2+dfsg-1ubuntu0.2_all.deb
e1561575130c41dc3309023a345de337e84b4b04c21c74db57f599e267114325 python3-pip-whl_22.0.2+dfsg-1ubuntu0.2_all.deb
$ doas dpkg -i python3-pip*
...
$ doas apt install -f
...
$
Installing pynitrokey downloaded a bunch of dependencies, and it would be nice to audit the license and security vulnerabilities for each of them. (Verbose output below slightly redacted.)
jas@kaka:~$ pip3 install --user pynitrokey
Collecting pynitrokey
Downloading pynitrokey-0.4.34-py3-none-any.whl (572 kB)
Collecting frozendict~=2.3.4
Downloading frozendict-2.3.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (113 kB)
Requirement already satisfied: click<9,>=8.0.0 in /usr/lib/python3/dist-packages (from pynitrokey) (8.0.3)
Collecting ecdsa
Downloading ecdsa-0.18.0-py2.py3-none-any.whl (142 kB)
Collecting python-dateutil~=2.7.0
Downloading python_dateutil-2.7.5-py2.py3-none-any.whl (225 kB)
Collecting fido2<2,>=1.1.0
Downloading fido2-1.1.0-py3-none-any.whl (201 kB)
Collecting tlv8
Downloading tlv8-0.10.0.tar.gz (16 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: certifi>=14.5.14 in /usr/lib/python3/dist-packages (from pynitrokey) (2020.6.20)
Requirement already satisfied: pyusb in /usr/lib/python3/dist-packages (from pynitrokey) (1.2.1.post1)
Collecting urllib3~=1.26.7
Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting spsdk<1.8.0,>=1.7.0
Downloading spsdk-1.7.1-py3-none-any.whl (684 kB)
Collecting typing_extensions~=4.3.0
Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Requirement already satisfied: cryptography<37,>=3.4.4 in /usr/lib/python3/dist-packages (from pynitrokey) (3.4.8)
Collecting intelhex
Downloading intelhex-2.3.0-py2.py3-none-any.whl (50 kB)
Collecting nkdfu
Downloading nkdfu-0.2-py3-none-any.whl (16 kB)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pynitrokey) (2.25.1)
Collecting tqdm
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting nrfutil<7,>=6.1.4
Downloading nrfutil-6.1.7.tar.gz (845 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: cffi in /usr/lib/python3/dist-packages (from pynitrokey) (1.15.0)
Collecting crcmod
Downloading crcmod-1.7.tar.gz (89 kB)
Preparing metadata (setup.py) ... done
Collecting libusb1==1.9.3
Downloading libusb1-1.9.3-py3-none-any.whl (60 kB)
Collecting pc_ble_driver_py>=0.16.4
Downloading pc_ble_driver_py-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Collecting piccata
Downloading piccata-2.0.3-py3-none-any.whl (21 kB)
Collecting protobuf<4.0.0,>=3.17.3
Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting pyserial
Downloading pyserial-3.5-py2.py3-none-any.whl (90 kB)
Collecting pyspinel>=1.0.0a3
Downloading pyspinel-1.0.3.tar.gz (58 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from nrfutil<7,>=6.1.4->pynitrokey) (5.4.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil~=2.7.0->pynitrokey) (1.16.0)
Collecting pylink-square<0.11.9,>=0.8.2
Downloading pylink_square-0.11.1-py2.py3-none-any.whl (78 kB)
Collecting jinja2<3.1,>=2.11
Downloading Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting bincopy<17.11,>=17.10.2
Downloading bincopy-17.10.3-py3-none-any.whl (17 kB)
Collecting fastjsonschema>=2.15.1
Downloading fastjsonschema-2.16.3-py3-none-any.whl (23 kB)
Collecting astunparse<2,>=1.6
Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting oscrypto~=1.2
Downloading oscrypto-1.3.0-py2.py3-none-any.whl (194 kB)
Collecting deepmerge==0.3.0
Downloading deepmerge-0.3.0-py2.py3-none-any.whl (7.6 kB)
Collecting pyocd<=0.31.0,>=0.28.3
Downloading pyocd-0.31.0-py3-none-any.whl (12.5 MB)
Collecting click-option-group<0.6,>=0.3.0
Downloading click_option_group-0.5.5-py3-none-any.whl (12 kB)
Collecting pycryptodome<4,>=3.9.3
Downloading pycryptodome-3.17-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting pyocd-pemicro<1.2.0,>=1.1.1
Downloading pyocd_pemicro-1.1.5-py3-none-any.whl (9.0 kB)
Requirement already satisfied: colorama<1,>=0.4.4 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (0.4.4)
Collecting commentjson<1,>=0.9
Downloading commentjson-0.9.0.tar.gz (8.7 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: asn1crypto<2,>=1.2 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.0)
Collecting pypemicro<0.2.0,>=0.1.9
Downloading pypemicro-0.1.11-py3-none-any.whl (5.7 MB)
Collecting libusbsio>=2.1.11
Downloading libusbsio-2.1.11-py3-none-any.whl (247 kB)
Collecting sly==0.4
Downloading sly-0.4.tar.gz (60 kB)
Preparing metadata (setup.py) ... done
Collecting ruamel.yaml<0.18.0,>=0.17
Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting cmsis-pack-manager<0.3.0
Downloading cmsis_pack_manager-0.2.10-py2.py3-none-manylinux1_x86_64.whl (25.1 MB)
Collecting click-command-tree==1.1.0
Downloading click_command_tree-1.1.0-py3-none-any.whl (3.6 kB)
Requirement already satisfied: bitstring<3.2,>=3.1 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (3.1.7)
Collecting hexdump~=3.3
Downloading hexdump-3.3.zip (12 kB)
Preparing metadata (setup.py) ... done
Collecting fire
Downloading fire-0.5.0.tar.gz (88 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse<2,>=1.6->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.37.1)
Collecting humanfriendly
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting argparse-addons>=0.4.0
Downloading argparse_addons-0.12.0-py3-none-any.whl (3.3 kB)
Collecting pyelftools
Downloading pyelftools-0.29-py2.py3-none-any.whl (174 kB)
Collecting milksnake>=0.1.2
Downloading milksnake-0.1.5-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: appdirs>=1.4 in /usr/lib/python3/dist-packages (from cmsis-pack-manager<0.3.0->spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.4)
Collecting lark-parser<0.8.0,>=0.7.1
Downloading lark-parser-0.7.8.tar.gz (276 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2<3.1,>=2.11->spsdk<1.8.0,>=1.7.0->pynitrokey) (2.0.1)
Collecting asn1crypto<2,>=1.2
Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
Collecting wrapt
Downloading wrapt-1.15.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78 kB)
Collecting future
Downloading future-0.18.3.tar.gz (840 kB)
Preparing metadata (setup.py) ... done
Collecting psutil>=5.2.2
Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
Collecting capstone<5.0,>=4.0
Downloading capstone-4.0.2-py2.py3-none-manylinux1_x86_64.whl (2.1 MB)
Collecting naturalsort<2.0,>=1.5
Downloading naturalsort-1.5.1.tar.gz (7.4 kB)
Preparing metadata (setup.py) ... done
Collecting prettytable<3.0,>=2.0
Downloading prettytable-2.5.0-py3-none-any.whl (24 kB)
Collecting intervaltree<4.0,>=3.0.2
Downloading intervaltree-3.1.0.tar.gz (32 kB)
Preparing metadata (setup.py) ... done
Collecting ruamel.yaml.clib>=0.2.6
Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)
Collecting termcolor
Downloading termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting sortedcontainers<3.0,>=2.0
Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: wcwidth in /usr/lib/python3/dist-packages (from prettytable<3.0,>=2.0->pyocd<=0.31.0,>=0.28.3->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.2.5)
Building wheels for collected packages: nrfutil, crcmod, sly, tlv8, commentjson, hexdump, pyspinel, fire, intervaltree, lark-parser, naturalsort, future
Building wheel for nrfutil (setup.py) ... done
Created wheel for nrfutil: filename=nrfutil-6.1.7-py3-none-any.whl size=898520 sha256=de6f8803f51d6c26d24dc7df6292064a468ff3f389d73370433fde5582b84a10
Stored in directory: /home/jas/.cache/pip/wheels/39/2b/9b/98ab2dd716da746290e6728bdb557b14c1c9a54cb9ed86e13b
Building wheel for crcmod (setup.py) ... done
Created wheel for crcmod: filename=crcmod-1.7-cp310-cp310-linux_x86_64.whl size=31422 sha256=5149ac56fcbfa0606760eef5220fcedc66be560adf68cf38c604af3ad0e4a8b0
Stored in directory: /home/jas/.cache/pip/wheels/85/4c/07/72215c529bd59d67e3dac29711d7aba1b692f543c808ba9e86
Building wheel for sly (setup.py) ... done
Created wheel for sly: filename=sly-0.4-py3-none-any.whl size=27352 sha256=f614e413918de45c73d1e9a8dca61ca07dc760d9740553400efc234c891f7fde
Stored in directory: /home/jas/.cache/pip/wheels/a2/23/4a/6a84282a0d2c29f003012dc565b3126e427972e8b8157ea51f
Building wheel for tlv8 (setup.py) ... done
Created wheel for tlv8: filename=tlv8-0.10.0-py3-none-any.whl size=11266 sha256=3ec8b3c45977a3addbc66b7b99e1d81b146607c3a269502b9b5651900a0e2d08
Stored in directory: /home/jas/.cache/pip/wheels/e9/35/86/66a473cc2abb0c7f21ed39c30a3b2219b16bd2cdb4b33cfc2c
Building wheel for commentjson (setup.py) ... done
Created wheel for commentjson: filename=commentjson-0.9.0-py3-none-any.whl size=12092 sha256=28b6413132d6d7798a18cf8c76885dc69f676ea763ffcb08775a3c2c43444f4a
Stored in directory: /home/jas/.cache/pip/wheels/7d/90/23/6358a234ca5b4ec0866d447079b97fedf9883387d1d7d074e5
Building wheel for hexdump (setup.py) ... done
Created wheel for hexdump: filename=hexdump-3.3-py3-none-any.whl size=8913 sha256=79dfadd42edbc9acaeac1987464f2df4053784fff18b96408c1309b74fd09f50
Stored in directory: /home/jas/.cache/pip/wheels/26/28/f7/f47d7ecd9ae44c4457e72c8bb617ef18ab332ee2b2a1047e87
Building wheel for pyspinel (setup.py) ... done
Created wheel for pyspinel: filename=pyspinel-1.0.3-py3-none-any.whl size=65033 sha256=01dc27f81f28b4830a0cf2336dc737ef309a1287fcf33f57a8a4c5bed3b5f0a6
Stored in directory: /home/jas/.cache/pip/wheels/95/ec/4b/6e3e2ee18e7292d26a65659f75d07411a6e69158bb05507590
Building wheel for fire (setup.py) ... done
Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116951 sha256=3d288585478c91a6914629eb739ea789828eb2d0267febc7c5390cb24ba153e8
Stored in directory: /home/jas/.cache/pip/wheels/90/d4/f7/9404e5db0116bd4d43e5666eaa3e70ab53723e1e3ea40c9a95
Building wheel for intervaltree (setup.py) ... done
Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26119 sha256=5ff1def22ba883af25c90d90ef7c6518496fcd47dd2cbc53a57ec04cd60dc21d
Stored in directory: /home/jas/.cache/pip/wheels/fa/80/8c/43488a924a046b733b64de3fac99252674c892a4c3801c0a61
Building wheel for lark-parser (setup.py) ... done
Created wheel for lark-parser: filename=lark_parser-0.7.8-py2.py3-none-any.whl size=62527 sha256=3d2ec1d0f926fc2688d40777f7ef93c9986f874169132b1af590b6afc038f4be
Stored in directory: /home/jas/.cache/pip/wheels/29/30/94/33e8b58318aa05cb1842b365843036e0280af5983abb966b83
Building wheel for naturalsort (setup.py) ... done
Created wheel for naturalsort: filename=naturalsort-1.5.1-py3-none-any.whl size=7526 sha256=bdecac4a49f2416924548cae6c124c85d5333e9e61c563232678ed182969d453
Stored in directory: /home/jas/.cache/pip/wheels/a6/8e/c9/98cfa614fff2979b457fa2d9ad45ec85fa417e7e3e2e43be51
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492037 sha256=57a01e68feca2b5563f5f624141267f399082d2f05f55886f71b5d6e6cf2b02c
Stored in directory: /home/jas/.cache/pip/wheels/5e/a9/47/f118e66afd12240e4662752cc22cefae5d97275623aa8ef57d
Successfully built nrfutil crcmod sly tlv8 commentjson hexdump pyspinel fire intervaltree lark-parser naturalsort future
Installing collected packages: tlv8, sortedcontainers, sly, pyserial, pyelftools, piccata, naturalsort, libusb1, lark-parser, intelhex, hexdump, fastjsonschema, crcmod, asn1crypto, wrapt, urllib3, typing_extensions, tqdm, termcolor, ruamel.yaml.clib, python-dateutil, pyspinel, pypemicro, pycryptodome, psutil, protobuf, prettytable, oscrypto, milksnake, libusbsio, jinja2, intervaltree, humanfriendly, future, frozendict, fido2, ecdsa, deepmerge, commentjson, click-option-group, click-command-tree, capstone, astunparse, argparse-addons, ruamel.yaml, pyocd-pemicro, pylink-square, pc_ble_driver_py, fire, cmsis-pack-manager, bincopy, pyocd, nrfutil, nkdfu, spsdk, pynitrokey
WARNING: The script nitropy is installed in '/home/jas/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed argparse-addons-0.12.0 asn1crypto-1.5.1 astunparse-1.6.3 bincopy-17.10.3 capstone-4.0.2 click-command-tree-1.1.0 click-option-group-0.5.5 cmsis-pack-manager-0.2.10 commentjson-0.9.0 crcmod-1.7 deepmerge-0.3.0 ecdsa-0.18.0 fastjsonschema-2.16.3 fido2-1.1.0 fire-0.5.0 frozendict-2.3.5 future-0.18.3 hexdump-3.3 humanfriendly-10.0 intelhex-2.3.0 intervaltree-3.1.0 jinja2-3.0.3 lark-parser-0.7.8 libusb1-1.9.3 libusbsio-2.1.11 milksnake-0.1.5 naturalsort-1.5.1 nkdfu-0.2 nrfutil-6.1.7 oscrypto-1.3.0 pc_ble_driver_py-0.17.0 piccata-2.0.3 prettytable-2.5.0 protobuf-3.20.3 psutil-5.9.4 pycryptodome-3.17 pyelftools-0.29 pylink-square-0.11.1 pynitrokey-0.4.34 pyocd-0.31.0 pyocd-pemicro-1.1.5 pypemicro-0.1.11 pyserial-3.5 pyspinel-1.0.3 python-dateutil-2.7.5 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 sly-0.4 sortedcontainers-2.4.0 spsdk-1.7.1 termcolor-2.2.0 tlv8-0.10.0 tqdm-4.65.0 typing_extensions-4.3.0 urllib3-1.26.15 wrapt-1.15.0
jas@kaka:~$
Then upgrading the device worked remarkable well, although I wish that the tool would have printed URLs and checksums for the firmware files to allow easy confirmation.
jas@kaka:~$ PATH=$PATH:/home/jas/.local/bin
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.15-5D271572: Nitrokey Nitrokey Start (RTM.12.1-RC2-modified)
jas@kaka:~$ nitropy start update
Command line tool to interact with Nitrokey devices 0.4.34
Nitrokey Start firmware update tool
Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
System: Linux, is_linux: True
Python: 3.10.6
Saving run log to: /tmp/nitropy.log.gc5753a8
Admin PIN:
Firmware data to be used:
- FirmwareType.REGNUAL: 4408, hash: ...b'72a30389' valid (from ...built/RTM.13/regnual.bin)
- FirmwareType.GNUK: 129024, hash: ...b'25a4289b' valid (from ...prebuilt/RTM.13/gnuk.bin)
Currently connected device strings:
Device:
Vendor: Nitrokey
Product: Nitrokey Start
Serial: FSIJ-1.2.15-5D271572
Revision: RTM.12.1-RC2-modified
Config: *:*:8e82
Sys: 3.0
Board: NITROKEY-START-G
initial device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.15-5D271572', 'Revision': 'RTM.12.1-RC2-modified', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
Please note:
- Latest firmware available is:
RTM.13 (published: 2022-12-08T10:59:11Z)
- provided firmware: None
- all data will be removed from the device!
- do not interrupt update process - the device may not run properly!
- the process should not take more than 1 minute
Do you want to continue? [yes/no]: yes
...
Starting bootloader upload procedure
Device: Nitrokey Start FSIJ-1.2.15-5D271572
Connected to the device
Running update!
Do NOT remove the device from the USB slot, until further notice
Downloading flash upgrade program...
Executing flash upgrade...
Waiting for device to appear:
Wait 20 seconds.....
Downloading the program
Protecting device
Finish flashing
Resetting device
Update procedure finished. Device could be removed from USB slot.
Currently connected device strings (after upgrade):
Device:
Vendor: Nitrokey
Product: Nitrokey Start
Serial: FSIJ-1.2.19-5D271572
Revision: RTM.13
Config: *:*:8e82
Sys: 3.0
Board: NITROKEY-START-G
device can now be safely removed from the USB slot
final device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.19-5D271572', 'Revision': 'RTM.13', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
finishing session 2023-03-16 21:49:07.371291
Log saved to: /tmp/nitropy.log.gc5753a8
jas@kaka:~$
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.19-5D271572: Nitrokey Nitrokey Start (RTM.13)
jas@kaka:~$
Before importing the master key to this device, it should be configured. Note the commands in the beginning to make sure scdaemon/pcscd is not running because they may have cached state from earlier cards. Change PIN code as you like after this, my experience with Gnuk was that the Admin PIN had to be changed first, then you import the key, and then you change the PIN.
jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
OK
ERR 67125247 Slut p fil <GPG Agent>
jas@kaka:~$ ps auxww grep -e pcsc -e scd
jas 11651 0.0 0.0 3468 1672 pts/0 R+ 21:54 0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......:
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card> admin
Admin commands are allowed
gpg/card> kdf-setup
gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE5D2715720000 detected
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 3
PIN changed.
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? q
gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon
gpg/card> lang
Language preferences: sv
gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m
gpg/card> login
Login data (account name): jas
gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt
gpg/card> forcesig
gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
If the key generation does not succeed, please check the
documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
gpg/card>
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
jas@kaka:~$
Once setup, bring out your offline machine and boot it and mount your USB stick with the offline key. The paths below will be different, and this is using a somewhat unorthodox approach of working with fresh GnuPG configuration paths that I chose for the USB stick.
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ cp -a gnupghome-backup-masterkey gnupghome-import-nitrokey-5D271572
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ gpg --homedir $PWD/gnupghome-import-nitrokey-5D271572 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Secret key is available.
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Really move the primary key? (y/N) y
Please select where to store the key:
(1) Signature key
(3) Authentication key
Your selection? 1
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg>
Save changes? (y/N) y
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$
At this point it is useful to confirm that the Nitrokey has the master key available and that is possible to sign statements with it, back on your regular machine:
jas@kaka:~$ gpg --card-status
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 1
KDF setting ......: on
Signature key ....: B1D2 BD13 75BE CB78 4CF4 F8C4 D73C F638 C53C 06BE
created ....: 2019-03-20 23:37:24
Encryption key....: [none]
Authentication key: [none]
General key info..: pub ed25519/D73CF638C53C06BE 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec> ed25519/D73CF638C53C06BE created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 5D271572
ssb> ed25519/80260EE8A9B92B2B created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> ed25519/51722B08FE4745A2 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> cv25519/02923D7EE76EBD60 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
jas@kaka:~$ echo foo gpg -a --sign gpg --verify
gpg: Signature made Thu Mar 16 22:11:02 2023 CET
gpg: using EDDSA key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$
Finally to retrieve and sign a key, for example Andre Heinecke s that I could confirm the OpenPGP key identifier from his business card.
jas@kaka:~$ gpg --locate-external-keys aheinecke@gnupg.com
gpg: key 1FDF723CF462B6B1: public key "Andre Heinecke <aheinecke@gnupg.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 2 signed: 7 trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1 valid: 7 signed: 64 trust: 7-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2023-05-26
pub rsa3072 2015-12-08 [SC] [expires: 2025-12-05]
94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1
uid [ unknown] Andre Heinecke <aheinecke@gnupg.com>
sub ed25519 2017-02-13 [S]
sub ed25519 2017-02-13 [A]
sub rsa3072 2015-12-08 [E] [expires: 2025-12-05]
sub rsa3072 2015-12-08 [A] [expires: 2025-12-05]
jas@kaka:~$ gpg --edit-key "94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1"
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
pub rsa3072/1FDF723CF462B6B1
created: 2015-12-08 expires: 2025-12-05 usage: SC
trust: unknown validity: unknown
sub ed25519/2978E9D40CBABA5C
created: 2017-02-13 expires: never usage: S
sub ed25519/DC74D901C8E2DD47
created: 2017-02-13 expires: never usage: A
The following key was revoked on 2017-02-23 by RSA key 1FDF723CF462B6B1 Andre Heinecke <aheinecke@gnupg.com>
sub cv25519/1FFE3151683260AB
created: 2017-02-13 revoked: 2017-02-23 usage: E
sub rsa3072/8CC999BDAA45C71F
created: 2015-12-08 expires: 2025-12-05 usage: E
sub rsa3072/6304A4B539CE444A
created: 2015-12-08 expires: 2025-12-05 usage: A
[ unknown] (1). Andre Heinecke <aheinecke@gnupg.com>
gpg> sign
pub rsa3072/1FDF723CF462B6B1
created: 2015-12-08 expires: 2025-12-05 usage: SC
trust: unknown validity: unknown
Primary key fingerprint: 94A5 C9A0 3C2F E5CA 3B09 5D8E 1FDF 723C F462 B6B1
Andre Heinecke <aheinecke@gnupg.com>
This key is due to expire on 2025-12-05.
Are you sure that you want to sign this key with your
key "Simon Josefsson <simon@josefsson.org>" (D73CF638C53C06BE)
Really sign? (y/N) y
gpg> quit
Save changes? (y/N) y
jas@kaka:~$
This is on my day-to-day machine, using the NitroKey Start with the offline key. No need to boot the old offline machine just to sign keys or extend expiry anymore! At FOSDEM 23 I managed to get at least one DD signature on my new key, and the Debian keyring maintainers accepted my Ed25519 key. Hopefully I can now finally let my 2014-era RSA3744 key expire in 2023-09-19 and not extend it any further. This should finish my transition to a simpler OpenPGP key setup, yay!
It's been a year since I started exploring HLedger, and I'm still
going. The rollover to 2023 was an opportunity to revisit my approach.
Some time ago I stumbled across Dmitry Astapov's HLedger notes (fully-fledged
hledger, which I briefly
mentioned in eventual consistency) and decided to adopt some of its ideas.
new year, new journal
First up, Astapov encourages starting a new journal file for a new calendar
year. I do this for other, accounting-adjacent files as a matter of course,
and I did it for my GNUCash files prior to adopting HLedger. But the reason
for those is a general suspicion that a simple mistake with those softwares
could irrevocably corrupt my data. I'm much more confident with HLedger, so
rolling over at years end isn't necessary for that. But there are other
advantages. A quick obvious one is you can get rid of old accounts (such as
expense accounts tied to a particular project, now completed).
one journal per import
In the first year, I periodically imported account data via CSV exports
of transactions and HLedger's (excellent) CSV import system. I imported
all the transactions, once each, into a single, large journal file.
Astapov instead advocates for creating a separate journal
for each CSV that you wish to import, and keep around the CSV, leaving you
with a 1:1 mapping of CSV:journal. Then use HLedger's "include" mechanism to
pull them all into the main journal.
With the former approach, where the CSV data was imported precisely, once, it
was only exposed to your import rules once. The workflow ended up being:
import transactions; notice some that you could have matched with import rules
and auto-coded; write the rule for the next time. With Astapov's approach, you
can re-generate the journal from the CSV at any point in the future with an
updated set of import rules.
tracking dependencies
Now we get onto the job of driving the generation of all these derivative
journal files. Astapov has built a sophisticated system using Haskell's "Shake",
which I'm not yet familiar, but for my sins I'm quite adept at (GNU-flavoured)
UNIX Make, so I started building with that. An example rule
This captures the dependency between the journal and the underlying CSV
but also to the relevant rules file; if I modify that, and this target
is run in the future, all dependent journals should be re-generated.1
opening balances
It's all fine and well starting over in a new year, and I might be generous
to forgive debts, but I can't count on others to do the same. We need
to carry over some balance information from one year to the next. Astapov has
a more complex (or perhaps featureful) scheme for this involving a custom
Haskell program, but I bodged something with a pair of make targets:
I think this could be golfed into a year-generic rule with a little more work.
The nice thing about this approach is the opening balances for a given year
might change, if adjustments are made in prior years. They shouldn't, for
real accounts, but very well could for more "virtual" liabilities. (including:
deciding to write off debts.)
run lots of reports
Astapov advocates for running lots of reports, and automatically. There's a
really obvious advantage of that to me: there's no chance anyone except me
will actually interact with HLedger itself. For family finances, I need
reports to be able to discuss anything with my wife.
Extending my make rules to run reports is trivial. I've gone for HTML
reports for the most part, as they're the easiest on the eye. Unfortunately
the most useful report to discuss (at least at the moment) would be a list
of transactions in a given expense category, and the register/aregister
commands did not support HTML as an output format. I submitted my first
HLedger patch to add HTML output support to aregister:
https://github.com/simonmichael/hledger/pull/2000
addressing the virtual posting problem
I wrote in my original hledger blog post that I had to resort to
unbalanced virtual postings in order to record both a liability between
my personal cash and family, as well as categorise the spend. I still
haven't found a nice way around that.
But I suspect having broken out the journal into lots of other journals
paves the way to a better solution to the above.
The form of a solution I am thinking of is: some scheme whereby the two
destination accounts are combined together; perhaps, choose one as a primary
and encode the other information in sub-accounts under that. For example,
repeating the example from my hledger blog post:
(I note this is very similar to a solution proposed to me by someone
responding on twitter).
The next step is to recognise that sometimes when looking at the data I
care about one aspect, and at other times the other, but rarely both. So
for the case where I'm thinking about family finances, I could use
account aliases
to effectively flatten out the expense category portion and ignore it.
On the other hand, when I'm concerned about how I've spent my personal
cash and not about how much I owe the family account, I could use
aliases to do the opposite: rewrite-away the family:liabilities:jon
prefix and combine the transactions with the regular jon:expenses
account heirarchy.
(this is all speculative: I need to actually try this.)
catching errors after an import
When I import the transactions for a given real bank account, I check the
final balance against another source: usually a bank statement, to make
sure they agree. I wasn't using any of the myriad methods to make sure
that this remains true later on, and so there was the risk that I make an
edit to something and accidentally remove a transaction that contributed
to that number, and not notice (until the next import).
The CSV data my bank gives me for accounts (not for credit cards) also includes
a 'resulting balance' field. It was therefore trivial to extend the CSV import
rules to add balance
assertions to
the transactions that are generated. This catches the problem.
There are a couple of warts with balance assertions on every such
transaction: for example, dealing with the duplicate transaction for paying
a credit card: one from the bank statement, one from the credit card.
Removing one of the two is sufficient to correct the account balances but
sometimes they don't agree on the transaction date, or the transactions
within a given day are sorted slightly differently by HLedger than by the
bank. The simple solution is to just manually delete one or two assertions:
there remain plenty more for assurance.
going forward
I've only scratched the surface of the suggestions in Astapov's "full fledged
HLedger" notes. I'm up to step 2 of 14. I'm expecting to return to it once
the changes I've made have bedded in a little bit.
I suppose I could anonymize and share the framework (Makefile etc) that I am
using if anyone was interested. It would take some work, though, so I don't know
when I'd get around to it.
the rm latest bit is to clear up some state-tracking files that HLedger writes to avoid importing duplicate transactions. In this case, I know better than HLedger.
I ve always found the operation of apt software package repositories to be a mystery. There appears to be a lack of transparency into which people have access to important apt package repositories out there, how the automatic non-human update mechanism is implemented, and what changes are published. I m thinking of big distributions like Ubuntu and Debian, but also the free GNU/Linux distributions like Trisquel and PureOS that are derived from the more well-known distributions.
As far as I can tell, anyone who has the OpenPGP private key trusted by a apt-based GNU/Linux distribution can sign a modified Release/InRelease file and if my machine somehow downloads that version of the release file, my machine could be made to download and install packages that the distribution didn t intend me to install. Further, it seems that anyone who has access to the main HTTP server, or any of its mirrors, or is anywhere on the network between them and my machine (when plaintext HTTP is used), can either stall security updates on my machine (on a per-IP basis), or use it to send my machine (again, on a per-IP basis to avoid detection) a modified Release/InRelease file if they had been able to obtain the private signing key for the archive. These are mighty powers that warrant overview.
I ve always put off learning about the processes to protect the apt infrastructure, mentally filing it under so many people rely on this infrastructure that enough people are likely to have invested time reviewing and improving these processes . Simultaneous, I ve always followed the more free-software friendly Debian-derived distributions such as gNewSense and have run it on some machines. I ve never put them into serious production use, because the trust issues with their apt package repositories has been a big question mark for me. The enough people part of my rationale for deferring this is not convincing. Even the simple question of is someone updating the apt repository is not easy to understand on a running gNewSense system. At some point in time the gNewSense cron job to pull in security updates from Debian must have stopped working, and I wouldn t have had any good mechanism to notice that. Most likely it happened without any public announcement. I ve recently switched to Trisquel on production machines, and these questions has come back to haunt me.
The situation is unsatisfying and I looked into what could be done to improve it. I could try to understand who are the key people involved in each project, and may even learn what hardware component is used, or what software is involved to update and sign apt repositories. Is the server running non-free software? Proprietary BIOS or NIC firmware? Are the GnuPG private keys on disk? Smartcard? TPM? YubiKey? HSM? Where is the server co-located, and who has access to it? I tried to do a bit of this, and discovered things like Trisquel having a DSA1024 key in its default apt trust store (although for fairness, it seems that apt by default does not trust such signatures). However, I m not certain understanding this more would scale to securing my machines against attacks on this infrastructure. Even people with the best intentions, and the state of the art hardware and software, will have problems.
To increase my trust in Trisquel I set out to understand how it worked. To make it easier to sort out what the interesting parts of the Trisquel archive to audit further were, I created debdistdiff to produce human readable text output comparing one apt archive with another apt archive. There is a GitLab CI/CD cron job that runs this every day, producing output comparing Trisquel vs Ubuntu and PureOS vs Debian. Working with these output files has made me learn more about how the process works, and I even stumbled upon something that is likely a bug where Trisquel aramo was imported from Ubuntu jammy while it contained a couple of package (e.g., gcc-8, python3.9) that were removed for the final Ubuntu jammy release.
After working on auditing the Trisquel archive manually that way, I realized that whatever I could tell from comparing Trisquel with Ubuntu, it would only be something based on a current snapshot of the archives. Tomorrow it may look completely different. What felt necessary was to audit the differences of the Trisquel archive continously. I was quite happy to have developed debdistdiff for one purpose (comparing two different archives like Trisquel and Ubuntu) and discovered that the tool could be used for another purpose (comparing the Trisquel archive at two different points in time). At this time I realized that I needed a log of all different apt archive metadata to be able to produce an audit log of the differences in time for the archive. I create manually curated git-repositories with the Release/InRelease and the Packages files for each architecture/component of the well-known distributions Trisquel, Ubuntu, Debian and PureOS. Eventually I wrote scripts to automate this, which are now published in the debdistget project.
At this point, one of the early question about per-IP substitution of Release files were lingering in my mind. However with the tooling I now had available, coming up with a way to resolve this was simple! Merely have apt compute a SHA256 checksum of the just downloaded InRelease file, and see if my git repository had the same file. At this point I started reading the Apt source code, and now I had more doubts about the security of my systems than I ever had before. Oh boy how the name Apt has never before felt more Apt?! Oh well, we must leave some exercises for the students. Eventually I realized I wanted to touch as little of apt code basis as possible, and noticed the SigVerify::CopyAndVerify function called ExecGPGV which called apt-key verify which called GnuPG s gpgv. By setting Apt::Key::gpgvcommand I could get apt-key verify to call another tool than gpgv. See where I m going? I thought wrapping this up would now be trivial but for some reason the hash checksum I computed locally never matched what was on my server. I gave up and started working on other things instead.
Today I came back to this idea, and started to debug exactly how the local files looked that I got from apt and how they differed from what I had in my git repositories, that came straight from the apt archives. Eventually I traced this back to SplitClearSignedFile which takes an InRelease file and splits it into two files, probably mimicking the (old?) way of distributing both Release and Release.gpg. So the clearsigned InRelease file is split into one cleartext file (similar to the Release file) and one OpenPGP signature file (similar to the Release.gpg file). But why didn t the cleartext variant of the InRelease file hash to the same value as the hash of the Release file? Sadly they differ by the final newline.
Having solved this technicality, wrapping the pieces up was easy, and I came up with a project apt-canary that provides a script apt-canary-gpgv that verify the local apt release files against something I call a apt canary witness file stored at a URL somewhere.
I m now running apt-canary on my Trisquel aramo laptop, a Trisquel nabia server, and Talos II ppc64el Debian machine. This means I have solved the per-IP substitution worries (or at least made them less likely to occur, having to send the same malicious release files to both GitLab and my system), and allow me to have an audit log of all release files that I actually use for installing and downloading packages.
What do you think? There are clearly a lot of work and improvements to be made. This is a proof-of-concept implementation of an idea, but instead of refining it until perfection and delaying feedback, I wanted to publish this to get others to think about the problems and various ways to resolve them.
Btw, I m going to be at FOSDEM 23 this weekend, helping to manage the Security Devroom. Catch me if you want to chat about this or other things. Happy Hacking!
Just days after a build-fix
release (for aarch64) and still only a few weeks after the 0.2.0
release of RcppTOML
and its switch to toml++, we have
another bugfix release 0.2.2 on CRAN also bringing release 3.3.0 of
toml++ (even if we
had large chunks of 3.3.0 already incorporated).
TOML is a file format that is most
suitable for configurations, as it is meant to be edited by
humans but read by computers. It emphasizes strong readability
for humans while at the same time supporting strong typing
as well as immediate and clear error reports. On small typos
you get parse errors, rather than silently corrupted garbage. Much
preferable to any and all of XML, JSON or YAML though sadly these may
be too ubiquitous now. TOML is
frequently being used with the projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka packages )
for the Rust
language.
The package was building fine on Intel-based macOS provided the
versions were recent enough. CRAN, however, aims for the
broadest possibly reach of binaries and builds on a fairly ancient macOS
10.13 with clang version 10. This confused toml++ into (wrongly)
concluding it could not build when it in fact can. After a hint from Simon that Apple in their infinite
wisdom redefines clang version ids, this has been reflected in version
3.3.0 of toml++ by
Mark so we should now build
everywhere. Big thanks to everybody for the help.
The short summary of changes follows.
Changes in version 0.2.2
(2023-01-29)
New toml++ version 3.3.0 with fix to permit
compilation on ancient macOS systems as used by CRAN for the Intel-based
builds.
Ever wondered how Trisquel and Ubuntu differs and what s behind the curtain from a developer perspective? I have. Sharing what I ve learnt will allow you to increase knowledge and trust in Trisquel too.
The scripts to convert an Ubuntu archive into a Trisquel archive are available in the ubuntu-purge repository. The easy to read purge-focal script lists the packages to remove from Ubuntu 20.04 Focal when it is imported into Trisquel 10.0 Nabia. The purge-jammy script provides the same for Ubuntu 22.04 Jammy and (the not yet released) Trisquel 11.0 Aramo. The list of packages is interesting, and by researching the reasons for each exclusion you can learn a lot about different attitudes towards free software and understand the desire to improve matters. I wish there were a wiki-page that for each removed package summarized relevant links to earlier discussions. At the end of the script there is a bunch of packages that are removed for branding purposes that are less interesting to review.
Trisquel adds a couple of Trisquel-specific packages. The source code for these packages are in the trisquel-packages repository, with sub-directories for each release: see 10.0/ for Nabia and 11.0/ for Aramo. These packages appears to be mostly for branding purposes.
Trisquel modify a set of packages, and here is starts to get interesting. Probably the most important package to modify is to use GNU Linux-libre instead of Linux as the kernel. The scripts to modify packages are in the package-helpers repository. The relevant scripts are in the helpers/ sub-directory. There is a branch for each Trisquel release, see helpers/ for Nabia and helpers/ for Aramo. To see how Linux is replaced with Linux-libre you can read the make-linux script.
This covers the basic of approaching Trisquel from a developers perspective. As a user, I have identified some areas that need more work to improve trust in Trisquel:
Auditing the Trisquel archive to confirm that the intended changes covered above are the only changes that are published.
Rebuild all packages that were added or modified by Trisquel and publish diffoscope output comparing them to what s in the Trisquel archive. The goal would be to have reproducible builds of all Trisquel-related packages.
Publish an audit log of the Trisquel archive to allow auditing of what packages are published. This boils down to trust of the OpenPGP key used to sign the Trisquel archive.
Trisquel archive mirror auditing to confirm that they are publishing only what comes from the official archive, and that they do so timely.
I hope to publish more about my work into these areas. Hopefully this will inspire similar efforts in related distributions like PureOS and the upstream distributions Ubuntu and Debian.
Happy hacking!
We plan to spend three days continuing to the grow of the Reproducible Builds effort. As in previous events, the exact content of the meeting will be shaped by the participants. And, as mentioned in Holger Levsen s post to our mailing list, the dates have been booked and confirmed with the venue, so if you are considering attending, please reserve these dates in your calendar today.
R my Gr nblatt, an associate professor in the T l com Sud-Paris engineering school wrote up his pain points of using Nix and NixOS. Although some of the points do not touch on reproducible builds, R my touches on problems he has encountered with the different kinds of reproducibility that these distributions appear to promise including configuration files affecting the behaviour of systems, the fragility of upstream sources as well as the conventional idea of binary reproducibility.
Morten Linderud reported that he is quietly optimistic that if Go programming language resolves all of its issues with reproducible builds (tracking issue) then the Go binaries distributed from Google and by Arch Linux may be bit-for-bit identical. It s just a bit early to sorta figure out what roadblocks there are. [But] Go bootstraps itself every build, so in theory I think it should be possible.
On December 15th, Holger Levsen published an in-depth interview he performed with David A. Wheeler on supply-chain security and reproducible builds, but it also touches on the biggest challenges in computing as well.
This is part of a larger series of posts featuring the projects, companies and individuals who support the Reproducible Builds project. Other instalments include an article featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix and Hans-Christoph Steiner of the F-Droid project.
A number of changes were made to the Reproducible Builds website and documentation this month, including FC Stegerman adding an F-Droid/apksigcopier example to our embedded signatures page [], Holger Levsen making a large number of changes related to the 2022 summit in Venice as well as 2023 s summit in Hamburg [][][][] and Simon Butler updated our publications page [][].
On our mailing list this month, James Addison asked a question about whether there has been any effort to trace the files used by a build system in order to identify the corresponding build-dependency packages. [] In addition, Bernhard M. Wiedemann then posed a thought-provoking question asking How to talk to skeptics? , which was occasioned by a colleague who had published a blog post in May 2021 skeptical of reproducible builds. The thread generated a number of replies.
Android news
obfusk (FC Stegerman) performed a thought-provoking review of tools designed to determine the difference between two different .apk files shipped by a number of free-software instant messenger applications.
These scripts are often necessary in the Android/APK ecosystem due to these files containing embedded signatures so the conventional bit-for-bit comparison cannot be used. After detailing a litany of issues with these tools, they come to the conclusion that:
It s quite possible these messengers actually have reproducible builds, but the verification scripts they use don t actually allow us to verify whether they do.
This reflects the consensus view within the Reproducible Builds project: pursuing a situation in language or package ecosystems where binaries are bit-for-bit identical (over requiring a bespoke ecosystem-specific tool) is not a luxury demanded by purist engineers, but rather the only practical way to demonstrate reproducibility. obfusk also announced the first release of their own set of tools on our mailing list.
Related to this, obfusk also posted to an issue filed against Mastodon regarding the difficulties of creating bit-by-bit identical APKs, especially with respect to copying v2/v3 APK signatures created by different tools; they also reported that some APK ordering differences were not caused by building on macOS after all, but by using Android Studio [] and that F-Droid added 16 more apps published with Reproducible Builds in December.
Debian
As mentioned in last months report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads).
During December, meetings were held on the 1st, 8th, 15th, 22nd and 29th, resulting in a large number of uploads and bugs being addressed:
Upstream patches
The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
Testing framework
The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
The osuosl167 machine is no longer a openqa-worker node anymore. [][]
Detect problems with APT repository signatures [] and update a repository signing key [].
Only install the foot-terminfo package on Debian systems. []
In addition, Mattia Rizzolo added support for the version of diffoscope in Debian stretch which doesn t support the --timeout flag. [][]
diffoscopediffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 228, 229 and 230 to Debian:
Fix compatibility with file(1) version 5.43, with thanks to Christoph Biedl. []
Skip the test_html.py::test_diff test if html2text is not installed. (#1026034)
Update copyright years. []
In addition, Jelle van der Waa added support for Berkeley DB version 6. []
Orthogonal to this, Holger Levsen bumped the Debian Standards-Version on all of our packages, including diffoscope [], strip-nondeterminism [], disorderfs [] and reprotest [].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can get in touch with us via: